The Task 2 of CIPS-SIGHAN 2012 Named Entity Recognition and Disambiguation in Chinese Bakeoff

Authors

  • Zhengyan He
  • Houfeng Wang
  • Sujian Li
Abstract

The CIPS-SIGHAN 2012 Chinese Named Entity Recognition and Disambiguation (NERD) bake-off was held in the summer of 2012. Named entity recognition and disambiguation is an important task in natural language processing and knowledge base construction. It aims to detect entity mentions in raw text and then link each detected mention to a real-world entity. Such entities can often be found in online encyclopedias such as Wikipedia and Baike. This task focuses on NERD in Chinese and presents challenges unique to the language, namely the confusion of named entities with common words and the absence of the capitalization cues available in English. We manually construct query names and a knowledge base from Baike. Evaluation results suggest a promising future for this field.

Similar resources

A Joint Chinese Named Entity Recognition and Disambiguation System

In this paper we describe an integrated approach for named entity recognition and disambiguation in Chinese. The proposed method relies on named entity recognition (NER), entity linking and document clustering models. Unlike other named entity tasks, both classification and clustering are considered in our models. After segmentation, information extraction and indexing in the prepr...

SIR-NERD: A Chinese Named Entity Recognition and Disambiguation System using a Two-Stage Method

This paper presents our SIR-NERD system for the Chinese named entity recognition and disambiguation task in the CIPS-SIGHAN joint conference on Chinese language processing (CLP2012). Our system uses a two-stage method and some key techniques to deal with the named entity recognition and disambiguation (NERD) task. Experimental results on the test data show that the proposed system, which incor...

Attribute based Chinese Named Entity Recognition and Disambiguation

In this paper, we briefly report our system for the Chinese Named Entity Recognition and Disambiguation task in the CIPS-SIGHAN joint conference. We first present a method to extract different types of target person attributes from text documents with multiple techniques. Then we use these attributes to disambiguate different entities. Finally, a classifier is used to distinguish entities in the knowled...

Chinese Personal Name Disambiguation Based on Vector Space Model

This paper introduces the Chinese personal name disambiguation task of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing (CLP) 2012, in which the Natural Language Processing Laboratory of Zhengzhou University took part. In this task, we mainly use the Vector Space Model to disambiguate Chinese personal names. We extract different named entity features from diverse names informa...

The CIPS-SIGHAN CLP 2012 Chinese Word Segmentation on MicroBlog Corpora Bakeoff

The CIPS-SIGHAN CLP 2012 Chinese Word Segmentation on MicroBlog Corpora Bakeoff was held in the autumn of 2012. This bake-off task of Chinese word segmentation is focused on the performance of Chinese word segmentation algorithms on MicroBlog corpora. 17 groups submitted 20 results, among which the best system has all the P, R and F values near 95%, and the average values of the 17 systems are ...

Publication date: 2012